Skip to content

Conversation

Jack-Khuu
Copy link
Contributor

@Jack-Khuu Jack-Khuu commented Sep 8, 2025

There are no functional change

This PR does some sanitization and moves some the function/classes in app/grpo into more appropriate files. Specifically:

  • reference_agent.py => reference_model.py
  • reference_agent.TitanRefModel => reference_model.ReferenceModel
  • grpo.RefModel => reference_model.HFReferenceModel
  • compute_logits + Episode => trainer.py

Spinning this up as a separate PR to make reviewing functional changes to RefModel clearer down the line

  • Also deletes unused code in reference_model.py

 python apps/grpo/main.py

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 8, 2025
logger.setLevel(logging.DEBUG)


def compute_logprobs(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the implementation here is functionally equivalent to the implementation previously in reference_actor.py (which was moved to trainer.py)

return logprobs


@dataclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO keep this top level in grpo/main, at the very least don't put it in the trainer file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on where to store Episode down the line?
I'm fine using "Episode" before then

@pbontrager
Copy link
Contributor

I don't actually think we should do this yet. I think we should just delete these as we have importable versions. I don't want the HF version to disperse into the codebase and I don't want us to abstract over different backends. We should be extremely lazy about moving things out of main.

@Jack-Khuu
Copy link
Contributor Author

Quick responses!

Noted, iceboxing this

@Jack-Khuu Jack-Khuu closed this Sep 8, 2025
@Jack-Khuu Jack-Khuu reopened this Sep 8, 2025
@Jack-Khuu Jack-Khuu marked this pull request as draft September 8, 2025 16:04
@Jack-Khuu Jack-Khuu changed the title Disperse GRPO app function/classes into respective files [Icebox] Disperse GRPO app function/classes into respective files Sep 8, 2025
@Jack-Khuu Jack-Khuu closed this Oct 2, 2025
@Jack-Khuu Jack-Khuu deleted the cleanup-grpo branch October 2, 2025 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants